Abstract

Background: As high throughput sequencing continues to grow more commonplace, the need to disseminate the 
resulting data via web applications continues to grow. Particularly, there is a need to disseminate multiple versions 
of related gene and protein sequences simultaneously — whether they represent alleles present in a single species, 
variations of the same gene among different strains, or homologs among separate species. Often this is accomplished 
by displaying all versions of the sequence at once in a manner that is not intuitive or space-efficient and does not 
facilitate human understanding of the data. Web-based applications needing to disseminate multiple versions of 
sequences would benefit from a drop-in module designed to effectively disseminate these data. 

Findings: SnipViz is a client-side software tool designed to disseminate multiple versions of related gene and protein 
sequences on web sites. SnipViz has a space-efficient, interactive, and dynamic interface for navigating, analyzing and 
visualizing sequence data. It is written using standard World Wide Web technologies (HTML, JavaScript, and CSS) and is
compatible with most web browsers. SnipViz is designed as a modular client-side web component and may be 
incorporated into virtually any web site and be implemented without any programming. 

Conclusions: SnipViz is a drop-in client-side module for web sites designed to efficiently visualize and disseminate 
gene and protein sequences. SnipViz is open source and is freely available at https://github.com/yeastrc/snipviz. 



Background 

Web sites designed to disseminate data that annotate genes
or proteins frequently also disseminate sequences for the
respective genes or proteins. Where there is only a single 
sequence, the problem of dissemination is relatively simple. 
The sequence is displayed as plain text in its entirety using 
a fixed-width font and may optionally be formatted so that 
position number in the sequence can be easily determined. 
For example, the sequence for the protein DSN1 from 
S. cerevisiae may be displayed as the following: 

    1          11         21         31         41         51
    |          |          |          |          |          |
  1 MTSVTRSEII DEKGPVMSKT HDHQLESSLS PVEVFAKTSA SLEMNQGVSE ERIHLGSSPK  60
 61 KGGNCDLSHQ ERLQSKSLHL SPQEQSASYQ DRRQSWRRAS MKETNRRKSL HPIHQGITEL 120
121 SRSISVDLAE SKRLGCLLLS SFQFSIQKLE PFLRDTKGFS LESFRAKASS LSEELKHFAD 180
181 GLETDGTLQK CFEDSNGKAS DFSLEASVAE MKEYITKFSL ERQTWDQLLL HYQQEAKEIL 240
241 SRGSTEAKIT EVKVEPMTYL GSSQNEVLNT KPDYQKILQN QSKVFDCMEL VMDELQGSVK 300
301 QLQAFMDEST QCFQKVSVQL GKRSMQQLDP SPARKLLKLQ LQNPPAIHGS GSGSCQ

However, there is often a need to simultaneously
disseminate multiple versions of related sequences.
Examples include displaying sequences for a given protein
from many strains of a particular species of yeast,
displaying multiple alleles from the same gene across a
population, and displaying the results of a multiple
sequence alignment of a homologous protein across many
species. While the above format works well for displaying
a single sequence, it is not well suited for simultaneously
displaying multiple sequences in a way that is
easily read by humans.

Strategies for simultaneously displaying multiple sequences
on the web include displaying FASTA-formatted data [1], where
each version of the sequence is displayed sequentially in its
entirety as plain text; or, more commonly, displaying
ClustalW-formatted data [2], where the first N positions of
each aligned sequence are listed sequentially in some order,
then the next N positions, then the next, until all versions
of the entire aligned sequences have been displayed. On web
pages, it is common to color code ClustalW-formatted data
based on sequence variability at specific positions or based
on some property of nucleotides or amino acids. However, this
method rapidly becomes cumbersome, space-inefficient, and
incomprehensible to human readers as more versions of the
sequence are included and as the sequence gets longer.

Many programs exist for generating or viewing multiple
sequence alignment information [3-8]. MView [6] is of
particular note as a very mature and feature-rich program
designed specifically for the display of multiple sequence
information. MView supports several input formats, is highly
configurable with regard to output, and supports saving the
output as HTML that may be used to display the data in a web
page. However, this HTML is a static display of the data that
is subject to the same limitations described above.
Additionally, MView requires running an external program,
either in advance or at runtime, which adds to the
complexity. The
JalviewLite applet [7] is a feature-rich Java applet that may 
be used by websites to disseminate multiple sequences 
using a dynamic interface. In this interface, the sequences 
may be presented using a sliding window, which elimi- 
nates the space and legibility problems created by static- 
ally displaying the entire sequence. However, JalviewLite 
requires that the end user have Java and Java web browser 
plugins installed. As a general-purpose dissemination plat- 
form, this is less than ideal as not every user running a 
web browser has Java installed and configured and Java 
applets are not compatible with many portable devices. 
Ideally, the data would be presented using a dynamic 
interface that runs entirely within the web browser and re- 
quires no external plugins. 

Here we present SnipViz, designed to efficiently
disseminate many versions of sequences of any length on
the web. SnipViz makes use of standard dynamic HTML
and JavaScript to present an interactive sliding window
view of the aligned sequences, so that increasing
sequence sizes do not result in more space on the web
page being devoted to displaying the sequences. SnipViz
supports hierarchical clustering of sequences, color
coding of positions, and graphical whole-sequence display
to assist in finding and interpreting locations of
sequence variation. SnipViz may be integrated into a web
page using standard HTML without any programming.
SnipViz is open source and freely available at
https://github.com/yeastrc/snipviz.

Findings 

Implementation 
Web component 

SnipViz is implemented using standard World Wide
Web technologies: JavaScript, AJAX, HTML, and Cascading
Style Sheets (CSS). It is cross platform and has been
tested in current versions of Chrome, Firefox, Safari, and
Internet Explorer running on Windows, Linux, MacOS,
and iOS. SnipViz has no server-side component other
than the availability over standard HTTP of the data to be
displayed.

SnipViz makes use of third-party JavaScript libraries,
including jQuery versions 1.5.2 and up (http://jquery.com/),
jQuery UI versions 1.7.0 and up (http://jqueryui.com/),
the JavaScript vector graphics library
(http://www.walterzorn.de/en/jsgraphics/jsgraphics_e.htm)
and DHTML tooltips library
(http://www.walterzorn.de/en/tooltip/tooltip_e.htm)
by Walter Zorn.

Architecture 

The web page initializes SnipViz by indicating the loca- 
tion of the input data using standard HTML. JavaScript 
code is then executed on the end-user client that re- 
trieves the indicated data via HTTP, parses the data, 
then constructs and displays the graphical user interface 
(GUI) on the web page. Once loaded, the GUI is purely 
a client-side application and no further communication 
with the server takes place. 

Input formats 

The only required data for SnipViz are either DNA or 
protein sequence data in FASTA format. SnipViz must 
be configured with the location of the FASTA data (either 
a static file or the output of a dynamic program) that con- 
tains all of the sequences and their associated labels. 
If the sequences require alignment, the sequences must be 
aligned prior to being loaded. 
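
As an illustration of the input described above, the following is a minimal sketch of reading FASTA text into labeled sequences and checking that they are already aligned. It is not taken from the SnipViz source; the function names parseFasta and isAligned are hypothetical, and the sketch assumes that every record starts with a ">" header line and that aligned sequences all have the same length once gap characters are included.

```javascript
// Parse FASTA text into an array of { label, sequence } records.
// Assumption: each record begins with a ">" header line; residues may be
// wrapped over several lines. Generic sketch, not the SnipViz parser.
function parseFasta(text) {
  const records = [];
  let current = null;
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "") continue; // skip blank lines
    if (line.startsWith(">")) {
      current = { label: line.slice(1).trim(), sequence: "" };
      records.push(current);
    } else if (current !== null) {
      // Append residues, dropping any whitespace inside the line.
      current.sequence += line.replace(/\s+/g, "");
    }
  }
  return records;
}

// Pre-aligned input means every sequence has the same length (gaps included).
function isAligned(records) {
  return records.every((r) => r.sequence.length === records[0].sequence.length);
}

// Usage with two short, made-up records.
const exampleFasta = ">strainA\nATG-CA\nTTA\n>strainB\nATGGCA\nTTA\n";
const exampleRecords = parseFasta(exampleFasta);
console.log(exampleRecords.map((r) => r.label), isAligned(exampleRecords));
```

A length check of this kind is a cheap way to catch unaligned input before it reaches the viewer, since the alignment itself must be produced by an external tool.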

SnipViz may optionally display a dendrogram indicat- 
ing the hierarchical clustering of the sequences based on 
any property the implementer wishes (e.g., phylogeny of 
originating organisms or similarity of sequences being 
displayed). To display the dendrogram, the user must in- 
dicate the location of Newick-formatted data [9] (either 
static file or output of a dynamic program) that contains 
the desired clustering for all of the labels in the FASTA 
sequence file. 
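
The requirement that the Newick data cover all labels in the FASTA file can be checked with a small reader such as the sketch below. This is an illustrative assumption rather than the SnipViz implementation: parseNewick, leafLabels, and missingFromTree are hypothetical names, and the reader handles only unquoted labels with optional branch lengths and no bracketed comments.

```javascript
// Minimal Newick reader returning a tree of { name, length, children } nodes.
// Assumptions: unquoted labels, no [comments]; whitespace is ignored.
function parseNewick(text) {
  const s = text.replace(/\s+/g, "");
  let pos = 0;
  function readNode() {
    const node = { name: "", length: null, children: [] };
    if (s[pos] === "(") {
      pos++; // consume "("
      node.children.push(readNode());
      while (s[pos] === ",") {
        pos++; // consume ","
        node.children.push(readNode());
      }
      if (s[pos] !== ")") throw new Error("Expected ')' at offset " + pos);
      pos++; // consume ")"
    }
    const nameStart = pos;
    while (pos < s.length && !"(),:;".includes(s[pos])) pos++;
    node.name = s.slice(nameStart, pos);
    if (s[pos] === ":") {
      pos++; // consume ":" and read the branch length
      const lengthStart = pos;
      while (pos < s.length && !"(),;".includes(s[pos])) pos++;
      node.length = parseFloat(s.slice(lengthStart, pos));
    }
    return node;
  }
  return readNode();
}

// Leaf labels in left-to-right order, i.e. the order of rows in a dendrogram.
function leafLabels(node) {
  if (node.children.length === 0) return [node.name];
  return node.children.flatMap((child) => leafLabels(child));
}

// FASTA labels that have no matching leaf in the tree.
function missingFromTree(fastaLabels, tree) {
  const inTree = new Set(leafLabels(tree));
  return fastaLabels.filter((label) => !inTree.has(label));
}

// Usage with illustrative strain labels.
const exampleTree = parseNewick("((S288c:0.1,W303:0.1):0.3,YJM789:0.4);");
console.log(leafLabels(exampleTree), missingFromTree(["S288c", "RM11"], exampleTree));
```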

Installation 

SnipViz is incorporated into a web page by importing 
JavaScript files and specifying data locations using stand- 
ard HTML. After importing the necessary JavaScript, the 
following is an example of the HTML necessary to con- 
figure and run SnipViz on a web site. 

<div class="snp-viewer-create-here" style="width:1000px;"
     snp-viewer-dna-fasta-file="url_to_data_directory/gene.fa"
     snp-viewer-dna-newick-file="url_to_data_directory/gene.newick"
></div>

This HTML will result in an instance of SnipViz
appearing wherever this HTML is placed on the web page,
displaying the indicated data: in this case, hierarchically
clustered DNA sequences from the indicated Newick
and FASTA files.

SnipViz may be optionally configured with both DNA
and protein sequences at the same time and will
automatically provide a link for toggling between viewing the
DNA and protein sequences. This is accomplished by
indicating the locations of the DNA and protein sequences
in the same configuration block. For example:

<div class="snp-viewer-create-here" style="width:1000px;"
     snp-viewer-dna-fasta-file="url_to_data_directory/gene_dna.fa"
     snp-viewer-dna-newick-file="url_to_data_directory/gene.newick"
     snp-viewer-protein-fasta-file="url_to_data_directory/gene_protein.fa"
     snp-viewer-protein-newick-file="url_to_data_directory/gene.newick"
></div>
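
To make the role of these attributes concrete, the sketch below shows one way a loader could collect the data locations from such a configuration block. Only the attribute names and the class name come from the examples above; the function readViewerConfig and the shape of the returned object are hypothetical, and the actual SnipViz loader may work differently.

```javascript
// Collect the data locations declared on one configuration element.
// `element` only needs a getAttribute(name) method, as DOM elements have.
// Hypothetical helper; the returned object shape is an assumption.
function readViewerConfig(element) {
  const config = {};
  for (const type of ["dna", "protein"]) {
    const fasta = element.getAttribute("snp-viewer-" + type + "-fasta-file");
    const newick = element.getAttribute("snp-viewer-" + type + "-newick-file");
    if (fasta) {
      // The Newick file is optional; FASTA data is required for each type.
      config[type] = { fastaUrl: fasta, newickUrl: newick || null };
    }
  }
  return config;
}

// In a browser, every configuration block on the page could be read like this.
if (typeof document !== "undefined") {
  document.querySelectorAll(".snp-viewer-create-here").forEach((element) => {
    console.log(readViewerConfig(element));
  });
}
```

When both a DNA and a protein entry are present in the result, the viewer has what it needs to offer the toggle link described above.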

Graphical user interface 
Basic functionality 

A screenshot of the SnipViz GUI is shown in Figure 1.
At the top of the interface is a graphical representation 
of the whole sequence with red bars indicating locations 
of sequence variation. In this representation is a dashed 
box that represents the part of the sequence currently 
being displayed below. Users may click anywhere in this 
sequence representation to center the currently-viewed 
window on that location, or click and drag the dashed 
box to move the window. 



Beneath the whole-sequence representation is the
display of the sequences themselves. To the left are the
labels supplied for the sequences from the FASTA file
and, optionally, the dendrogram representing the
hierarchical clustering of the sequences from a Newick file.
To the right, the section of the sequences corresponding
to the currently-viewed window is displayed. Beneath
the sequences, users may click the arrows to page left or
right through the sequences.
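
The navigation behavior described in this section reduces to a few lines of index arithmetic. The sketch below is an illustration under stated assumptions, not the SnipViz code: positions are 0-based, the helper names are hypothetical, and the navigation bar width in pixels is passed in by the caller.

```javascript
// Keep a value inside the inclusive range [min, max].
function clamp(value, min, max) {
  return Math.min(Math.max(value, min), max);
}

// Map a click x-coordinate on the navigation bar to a 0-based sequence position.
function pixelToPosition(pixelX, barWidthPx, seqLength) {
  const position = Math.floor((pixelX / barWidthPx) * seqLength);
  return clamp(position, 0, seqLength - 1);
}

// Start index of a window of `windowSize` positions centered on `position`,
// shifted as needed so the window never runs off either end of the sequence.
function centerWindow(position, windowSize, seqLength) {
  const maxStart = Math.max(0, seqLength - windowSize);
  return clamp(position - Math.floor(windowSize / 2), 0, maxStart);
}

// Page left (direction -1) or right (direction +1) by one full window.
function pageWindow(start, direction, windowSize, seqLength) {
  const maxStart = Math.max(0, seqLength - windowSize);
  return clamp(start + direction * windowSize, 0, maxStart);
}

// Usage: a click in the middle of a 1000-pixel bar over 1,731 positions.
const clicked = pixelToPosition(500, 1000, 1731);
console.log(clicked, centerWindow(clicked, 50, 1731));
```

Clamping the window start, rather than the clicked position alone, keeps the displayed segment at its full width even when the user clicks near either end of the sequence.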

Sequence highlighting 

Users may click the labels to toggle highlighting of one
or more sequences (Figure 2). If highlighted, a sequence
will have its colors inverted to emphasize the sequence 
and make sequence variation among highlighted se- 
quences easier to discern. If more than one sequence is 
highlighted, the indicators of sequence variation (in the 
whole-sequence representation above the sequences and 
the column highlighting in the sequence window) will 
only indicate variation among the highlighted sequences. 
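
The rule for restricting variation to the highlighted sequences can be sketched as follows. The function name variableColumns is hypothetical, and the sketch assumes equal-length aligned sequences and treats a gap character as just another symbol; how SnipViz itself treats gaps is not stated here.

```javascript
// Return the 0-based columns at which the chosen sequences are not all
// identical. `sequences` are equal-length aligned strings and `highlighted`
// is an array of row indexes. Following the text, variation is restricted to
// the highlighted rows only when more than one row is highlighted.
function variableColumns(sequences, highlighted) {
  const rows = highlighted.length > 1
    ? highlighted.map((index) => sequences[index])
    : sequences;
  const columns = [];
  if (rows.length === 0) return columns;
  for (let col = 0; col < rows[0].length; col++) {
    const first = rows[0][col];
    if (rows.some((seq) => seq[col] !== first)) columns.push(col);
  }
  return columns;
}

// Usage: three short, made-up aligned sequences.
const exampleRows = ["ACGT", "ACGA", "TCGT"];
console.log(variableColumns(exampleRows, []), variableColumns(exampleRows, [0, 1]));
```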

Very long sequences 

Because the dashed box in the whole-sequence window
represents the currently-viewed segment of the sequence,
and because that segment is a fixed size (e.g., 50
nucleotides), the width of the dashed box will change based
on the length of the whole sequence. As the overall
sequence gets longer, the dashed box representing the
segment of the sequence being shown will become narrower.
If the sequence is long enough, the box representing 50
positions in that sequence will be so narrow that it will
not be a useful element of the GUI.

Figure 1 A screen capture of SnipViz displaying the DNA sequence of a gene for 22 different strains of S. cerevisiae. The top rectangle is
a graphical representation of the whole sequence (1,731 nucleotides long) that serves as a whole-sequence navigation bar. The dashed box
indicates the currently-viewed segment of the sequence and may be clicked and dragged to the desired location in the sequence. The red bars
indicate locations of variation in the sequence among all the strains. The bottom left displays a hierarchically clustered list of the sequence labels,
and the bottom right displays the sequences. The blue bars highlight columns in the sequence display where variation occurs.

Figure 2 A screen capture of SnipViz illustrating the effect of highlighting specific sequences. In this example, the sequence for a protein
from 22 different strains of S. cerevisiae is shown. The user has clicked the names of several sequences to highlight them.
The red lines in the sequence navigation bar and blue column highlights of the sequence now indicate locations of sequence
variability only among the highlighted sequences.

To solve this problem, SnipViz will detect when the 
dashed box would be too narrow and employ a second 
level graphical sequence representation (Figure 3). When 
this occurs, the top sequence representation will contain 
a dashed box that indicates which segment of the se- 
quence is represented by the sequence representation 
below it. This second sequence representation will itself 
contain a dashed box that indicates which segment of 
the sequence is being displayed in the sequence viewer 
area below. This method ensures that even extremely 
long sequences may be graphically represented and sim- 
ply navigated. 
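
The decision to add a second navigation bar can be expressed as a comparison between the width of the dashed box and a minimum usable width. The sketch below is an assumption about how such a rule might look; the threshold minBoxPx, the function name planNavigationBars, and the way the intermediate segment length is chosen are all hypothetical and are not taken from the SnipViz source.

```javascript
// Decide whether one navigation bar is enough, or whether a second-level bar
// is needed because the dashed box in the top bar would be too narrow.
// All lengths are in sequence positions; all widths are in pixels.
function planNavigationBars(seqLength, windowSize, barWidthPx, minBoxPx) {
  const boxPx = (windowSize / seqLength) * barWidthPx;
  if (boxPx >= minBoxPx) {
    return { levels: 1, windowBoxPx: boxPx };
  }
  // Choose the segment shown in the second bar so that the box for the
  // viewed window is about minBoxPx wide within that second bar.
  const segmentLength = Math.floor((windowSize * barWidthPx) / minBoxPx);
  return {
    levels: 2,
    segmentLength: segmentLength,
    // Width of the top-bar box that marks the segment shown in the second bar.
    segmentBoxPx: (segmentLength / seqLength) * barWidthPx,
    // Width of the second-bar box that marks the viewed window.
    windowBoxPx: (windowSize / segmentLength) * barWidthPx,
  };
}

// Usage: the sequence lengths from Figures 1 and 3, with a 50-position window,
// an assumed 1000-pixel bar, and an assumed 10-pixel minimum box width.
console.log(planNavigationBars(1731, 50, 1000, 10));
console.log(planNavigationBars(14733, 50, 1000, 10));
```

Under these assumed numbers, a 1,731-position sequence needs one bar, while a 14,733-position sequence needs two. A sequence long enough to make the top-bar box too narrow as well would need a further level, which this sketch does not handle.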

Indicators of variation 

As previously mentioned, the whole-sequence represen- 
tation above the sequences contains red lines that serve 
as indicators of locations of sequence variation among 
all of the sequences or among the currently-highlighted 
sequences. Both the shade of red and the height of the 
line contain information. 



The height of the line indicates the number of po- 
sitions represented by that line that contain variation. 
Because the width of the graphical whole-sequence rep- 
resentation may contain fewer pixels than the number of 
positions in the sequence, each line may represent more 
than one position in the sequence. A taller line indicates 
more relative variation at the represented position in the 
sequence than a shorter line. 
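
The relationship between line height and the number of variable positions per line can be sketched as a simple binning step. This is an illustrative assumption rather than the SnipViz implementation: the function name indicatorHeights is hypothetical, and scaling the height by the fraction of variable positions in each bin is one plausible reading of "relative variation".

```javascript
// Compute one indicator height per pixel column of the navigation bar.
// `variablePositions` holds 0-based positions that show variation.
// Each pixel column (bin) covers one or more sequence positions, and its
// height is the fraction of those positions that vary, times maxHeightPx.
function indicatorHeights(variablePositions, seqLength, barWidthPx, maxHeightPx) {
  const bins = Math.min(barWidthPx, seqLength); // never more bins than positions
  const binOf = (position) => Math.floor((position * bins) / seqLength);
  const sizes = new Array(bins).fill(0);
  const counts = new Array(bins).fill(0);
  for (let position = 0; position < seqLength; position++) {
    sizes[binOf(position)]++;
  }
  for (const position of variablePositions) {
    counts[binOf(position)]++;
  }
  return counts.map((count, i) => Math.round((count / sizes[i]) * maxHeightPx));
}

// Usage: 8 positions drawn on a 4-pixel bar, so each line covers 2 positions.
console.log(indicatorHeights([0, 1, 5], 8, 4, 10));
```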

The shade of red is meant to indicate the significance 
of the variation at a given position, with darker red indi- 
cating more significant variation. In the case of DNA se- 
quences, there are two shades of red: pale and dark. Pale 
red indicates that all variation at that position results in
the same amino acid being encoded (silent mutations). 
Dark red indicates that there is at least one substitution 
at that position among all the sequences that results in a 
different encoded amino acid. In the case of protein se- 
quences, the shade of red is determined by a calculation 
using the BLOSUM 80 [10] amino acid substitution 
matrix. For all positions with sequence variation, a score 
is calculated by comparing all amino acids at that position 
against all other amino acids at that position and sum- 
ming the BLOSUM 80 substitution values. The resulting 
values are used to linearly scale the intensity of red in the 
indicator line between a pale and dark shade of red. 
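
The two shading rules can be illustrated with the sketch below, which rests on several assumptions that are not stated in the text. For DNA, it assumes the sequences are ungapped coding sequences read in frame from the first position and translated with the standard genetic code. For protein, it counts each pair of residues once and takes the substitution matrix as an argument; the small matrix in the usage example holds placeholder numbers, not real BLOSUM 80 entries. The direction of the linear scale (lower score, darker red) and all function names are likewise hypothetical.

```javascript
// Standard genetic code, indexed by base order T, C, A, G at each codon position.
const BASES = "TCAG";
const AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

// Translate one codon; gaps, ambiguous bases, or short codons give "X".
function translateCodon(codon) {
  const i = BASES.indexOf(codon[0]);
  const j = BASES.indexOf(codon[1]);
  const k = BASES.indexOf(codon[2]);
  if (i < 0 || j < 0 || k < 0) return "X";
  return AMINO_ACIDS[16 * i + 4 * j + k];
}

// DNA rule: "pale" if every sequence encodes the same amino acid in the codon
// containing `column` (silent variation), otherwise "dark".
function dnaShade(sequences, column) {
  const codonStart = column - (column % 3);
  const encoded = new Set(sequences.map((seq) =>
    translateCodon(seq.slice(codonStart, codonStart + 3).toUpperCase())));
  return encoded.size === 1 ? "pale" : "dark";
}

// Protein rule: sum the substitution scores over all pairs of residues found
// at one position. `matrix[a][b]` is the score for residues a and b.
function sumOfPairsScore(residues, matrix) {
  let score = 0;
  for (let a = 0; a < residues.length; a++) {
    for (let b = a + 1; b < residues.length; b++) {
      score += matrix[residues[a]][residues[b]];
    }
  }
  return score;
}

// Linear scale from 0 (pale red) to 1 (dark red) over the observed score range.
// Assumption: lower scores (less conservative substitutions) are darker.
function redIntensity(score, minScore, maxScore) {
  if (maxScore === minScore) return 1;
  return (maxScore - score) / (maxScore - minScore);
}

// Usage with a placeholder matrix (NOT real BLOSUM 80 values).
const toyMatrix = {
  L: { L: 4, I: 2, D: -4 },
  I: { L: 2, I: 5, D: -4 },
  D: { L: -4, I: -4, D: 6 },
};
console.log(dnaShade(["CTTAAA", "CTCAAA"], 2));
console.log(sumOfPairsScore(["L", "L", "I"], toyMatrix));
```

In practice, the minimum and maximum passed to redIntensity would be taken over the scores of all variable positions, so that the darkest line marks the least conservative position in the alignment being viewed.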

The main sequence viewing area also contains
indicators of the location of sequence variability. These
appear as a shade of blue that highlights specific columns
in the sequence and as an indicator block above each
highlighted column. The shade of blue is determined using
the same logic as the shade of red in the indicator lines
described above.

Figure 3 A screen capture of SnipViz illustrating the effect of viewing very long sequences; in this case 22 separate DNA sequences
each 14,733 nucleotides in length. A second sequence navigation bar has been created at the top, such that the dashed box in the top bar
indicates the region of the sequence shown in the second bar, and the dashed box in the second bar indicates the region of the sequence
being displayed below. Both boxes may be clicked and dragged in their respective sequence navigation bars to display the DNA sequence at the
desired location.

Implementation considerations 

Although there is no limit coded into SnipViz for the
length or number of sequences that may be simultaneously
viewed, there are practical limitations that should
be considered. Increasing the length or number of
sequences consumes resources and places an increasingly
large computational load on JavaScript and the web
browser when the user manipulates the interface. To
determine practical limitations, the responsiveness of
SnipViz was tested using the latest versions of Chrome,
Firefox, and Internet Explorer on Microsoft Windows 7 by
varying the length (up to 120,000 positions) and number
of displayed sequences (up to 1,000). For all three
browsers, we found that displaying up to 100 sequences
resulted in acceptable performance (the length of the
sequences had a negligible impact). SnipViz successfully
loaded in all of our tests (up to 1,000 sequences, each with
120,000 positions), though responsiveness of the interface
was severely degraded by this point. Loading thousands of
sequences is not recommended, as limitations of available
memory start to become a significant issue that may result
in crashing of the web browser.

Current implementations 

To view simple demonstrations of implementations of 
SnipViz, see http://www.yeastrc.org/snipviz/. There are 
demonstrations of multiple configurations (DNA, protein, 
very long sequences, and clustering), and each demonstra- 
tion includes the HTML required to implement it. 

To see an example of how SnipViz may be integrated
into a web application, see http://www.yeastrc.org/g2p/,
a web application developed to support a study examining
the phenotypic consequences of sequence variation
among 22 phylogenetically diverse strains of the budding
yeast S. cerevisiae [11]. For this implementation, SnipViz
is used to visualize locations of sequence variation in
the same gene or protein across all of the sequenced
strains of yeast. For a specific example, see
http://www.yeastrc.org/g2p/phenomeviewProtein.do?orfName=YCR088W&listing=ABP1+%2f+YCR088W.

Future directions

Central among the future plans for SnipViz are (1) removing
the requirement that the loaded sequences be
either DNA or protein sequences and (2) making the 
highlighting system more modular so that custom logic 
may be easily used to override the default highlighting 
system. Not only should users be able to display RNA or, 
indeed, any conceivable type of sequence, they should be 
able to easily implement some highlighting that indicates 
their own determination of significance for variability at 
specific positions. 

We welcome other developers to download the code, 
make improvements, and contribute to the project at 
https://github.com/yeastrc/snipviz. 

Conclusions 

SnipViz is a client-side web module for efficiently
displaying multiple versions of DNA or protein sequences.
It has a highly dynamic, interactive graphical user
interface written using standard World Wide Web
technologies and may be installed without any
programming. It is cross-platform and compatible with
current versions of the major web browsers (Chrome,
Firefox, Safari, and Internet Explorer). SnipViz is open
source and freely available at
https://github.com/yeastrc/snipviz.

Availability and requirements 

Project name: SnipViz

Project home page: https://github.com/yeastrc/snipviz 
Operating system(s): Platform independent 
Programming language: JavaScript, HTML, CSS 
Other requirements: None 
License: Apache 2.0 

Any restrictions to use by non-academics: None

Abbreviations

AJAX: Asynchronous JavaScript and XML; CSS: Cascading Style Sheets;
GUI: Graphical User Interface; HTTP: HyperText Transfer Protocol;
HTML: HyperText Markup Language; XML: Extensible Markup Language.